Abstract:

ZSPA-1 is an engineered protein that binds to its parent, the three-helix-bundle Z
domain of staphylococcal protein A. Uncomplexed ZSPA-1 shows a reduced helix
content and a melting behavior that is less cooperative, compared with the wild-type
Z domain. Here we show that the difference in folding behavior between these two
sequences can be partly understood in terms of an off-lattice model with 5-6 atoms
per amino acid and a minimalistic potential, in which folding is driven by backbone
hydrogen bonding and effective hydrophobic attraction.

1 Introduction



It is becoming increasingly clear that unstructured proteins play an important
biological role [1,2]; in many cases, such proteins adopt a specific structure upon
binding to their biological targets. Recently, it was demonstrated that the in vitro
evolved ZSPA-1 protein [3] exhibits coupled folding and binding [4].

ZSPA-1 is derived from the Z domain of staphylococcal protein A, a 58-amino acid,
well characterized [3] three-helix-bundle protein. ZSPA-1 was engineered [3] by
randomizing 13 amino acid positions and selecting for binding to the Z domain itself.
Subsequently, the structure of the Z:ZSPA-1 complex was determined both in
solution [1] and by crystallography [6]. In the complex, both ZSPA-1 and the Z domain
adopt structures similar to the solution structure of the Z domain. However, in
solution, ZSPA-1 does not behave like the Z domain; Wahlberg et al. [4] found that
uncomplexed ZSPA-1 lacks a well-defined structure, and that its melting behavior is
less cooperative than that of the Z domain.

The Z domain is a close analog of the B domain of protein A, a chain that is known to
show two-state folding without any meta-stable intermediate state [7,8]. The folding
behavior of the B domain has been studied theoretically by many different groups,
including ourselves, using both all-atom [9,10,11,12] and reduced [13,14,15,16,17]
models. In many cases, it was possible to fold this chain, but to achieve that, most
models rely on the so-called Gō prescription [18]. Our model [17] is, by contrast,
sequence-based. This makes it possible for us to study both ZSPA-1 and the
wild-type Z domain and compare their behaviors, using one and the same model.

The purpose of this note is twofold. First, we check whether our model can explain the
difference in melting behavior between ZSPA-1 and the wild-type sequence. Second,
using this model, we study the structure of ZSPA-1.



2 Materials and Methods 



2.1 Model 

The model we study is an extension of a model with three amino acid types [19,20,21]
to a five-letter alphabet. The five amino acid types are hydrophobic (Hyd), polar
(Pol), Ala, Pro and Gly. Hyd, Pol and Ala share the same geometric representation
but differ in hydrophobicity. Pro and Gly have their own geometric representations.

The Hyd, Pol and Ala representation contains six atoms. The three backbone atoms
N, Cα and C and the H and O atoms of the peptide unit are all included. The H
and O atoms are used to define hydrogen bonds. The sixth atom is a large Cβ that
represents the side chain. Gly lacks the Cβ atom but is otherwise the same. The
representation of Pro differs from that of Hyd, Pol and Ala in that the H atom is
replaced by a side-chain atom, Cδ, and that the Ramachandran angle φ is held fixed
at -65°.

The degrees of freedom of our model are the Ramachandran torsion angles φ and ψ,
with the exception that φ is held fixed for Pro. All bond lengths, bond angles and
peptide torsion angles (180°) are held fixed.

The interaction potential

$$E = E_{\mathrm{loc}} + E_{\mathrm{ev}} + E_{\mathrm{hb}} + E_{\mathrm{hp}} \qquad (1)$$

is composed of four terms. The first term is a local φ, ψ potential. The other three
terms represent excluded volume, backbone hydrogen bonds and effective hydrophobic
attraction, respectively (no explicit water). For simplicity, the hydrophobicity
potential is taken to be pairwise additive. Only Hyd-Hyd and Hyd-Ala Cβ pairs
experience this type of interaction. In particular, this means that Ala is intermediate
in hydrophobicity between Hyd and Pol. The amino acids in the Hyd class are Val,
Leu, Ile, Phe, Trp and Met, whereas those in the Pol class are Arg, Asn, Asp, Cys,
Gln, Glu, His, Lys, Ser, Thr and Tyr. A complete description of the model, including
numerical values of all the parameters, can be found in our earlier study [17].
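The five-letter alphabet and the rule for which pairs attract can be written down compactly. The following is a minimal sketch of that bookkeeping only; the function names are illustrative assumptions, and the actual functional form and strength of the hydrophobicity term are those of the earlier study, not reproduced here.

```python
# Illustrative mapping of the 20 amino acids (one-letter codes) onto the
# five model types. Names below are hypothetical, chosen for this sketch.
HYD = set("VLIFWM")        # Val, Leu, Ile, Phe, Trp, Met
POL = set("RNDCQEHKSTY")   # Arg, Asn, Asp, Cys, Gln, Glu, His, Lys, Ser, Thr, Tyr
SPECIAL = {"A": "Ala", "P": "Pro", "G": "Gly"}


def model_type(aa: str) -> str:
    """Return the model type (Hyd, Pol, Ala, Pro or Gly) for a one-letter code."""
    if aa in HYD:
        return "Hyd"
    if aa in POL:
        return "Pol"
    if aa in SPECIAL:
        return SPECIAL[aa]
    raise ValueError(f"unknown amino acid: {aa!r}")


def has_hydrophobic_attraction(a: str, b: str) -> bool:
    """True only for Hyd-Hyd and Hyd-Ala pairs, as stated in the text."""
    pair = {model_type(a), model_type(b)}
    return pair == {"Hyd"} or pair == {"Hyd", "Ala"}
```

For example, `has_hydrophobic_attraction("L", "A")` is true, whereas an Ala-Ala or Hyd-Pol pair gives no contribution, which is the sense in which Ala is intermediate between Hyd and Pol.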

In this earlier study, the model was applied to the 10-55-amino acid fragment of the
B domain of protein A. Despite the simplicity of the potential, this sequence was
found to have the following properties [17] in the model:

• It does make a three-helix bundle with the native three-helix-bundle topology,†
although the suppression of the wrong topology is not very strong. All helices
are right-handed, as they should be.

• Energy minimization restricted to the thermodynamically favored (native)
topology gives a structure with a root-mean-square deviation (RMSD) of 1.8 Å from
the NMR structure [22] (calculated over all backbone atoms).

• The collapse transition is much more cooperative for this sequence than for
random sequences with the same composition. Moreover, chain collapse and
helix formation occur at approximately the same temperature.

† There are two possible three-helix-bundle topologies; if we let the first two helices form a U,
then the third helix can be either in front of or behind this U.

Z       QQN AFY EIL HLP NLN EEQ RNA FIQ SLK
ZSPA-1  LSV AGR EIV TLP NLN DPQ KKA FIF SLW

Table 1: Amino acids 9 to 35 for ZSPA-1 and the wild-type Z domain.

The relative order of chain collapse and helix formation depends strongly on the 
relative strength of the hydrogen bonds and the hydrophobic attraction, so the last 
conclusion may seem somewhat arbitrary. However, the chain does not fold to a
helical bundle if the hydrogen bonds are too strong, and it does not fold in a cooperative
manner if the hydrogen bonds are too weak [20]. As a result, with our ansatz for
the potential, there is not much freedom left in the choice of these parameters, if the 
chain is to fold to a compact helical bundle in a cooperative manner. 

In the present study, we apply the same model, with unchanged parameters, to
ZSPA-1 and the Z domain of protein A. Following previous calculations for the B
domain [9,10,11,12,13,14,15,16,17], we consider the 9-54-amino acid fragments of these
two sequences (corresponding to the 10-55-amino acid fragment of the B domain).
It should be mentioned that we also performed calculations for the 4-54-amino acid
fragments of ZSPA-1 and the Z domain, with similar results.

The amino acid sequences of ZSPA-1 and the Z domain differ at 13 positions, all of
which are located in the section 9-35. Table 1 shows this part of the sequences.
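The 13 differences can be read off directly from the two rows of Table 1. The short sketch below does this position by position; the variable and function names are illustrative, and the sequence numbering assumes that the first residue listed in the table is residue 9, as the table caption states.

```python
# Residues 9-35 from Table 1, with the spaces removed.
Z_WT = "QQNAFYEILHLPNLNEEQRNAFIQSLK"
Z_SPA = "LSVAGREIVTLPNLNDPQKKAFIFSLW"
FIRST = 9  # sequence number of the first residue listed in Table 1


def mutations(wild_type, variant, first=FIRST):
    """Return (position, wild-type residue, variant residue) for each difference."""
    return [(first + i, a, b)
            for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b]


def count_hyd(seq):
    """Number of residues in the Hyd class (Val, Leu, Ile, Phe, Trp, Met)."""
    return sum(aa in "VLIFWM" for aa in seq)


muts = mutations(Z_WT, Z_SPA)
print(len(muts))                            # number of differing positions
print(count_hyd(Z_SPA) - count_hyd(Z_WT))   # change in the number of Hyd residues
```

The list contains 13 entries, among them (13, 'F', 'G') and (25, 'E', 'P'), that is, the Phe13Gly and Glu25Pro substitutions discussed in the Results. Within this segment the engineered sequence has three more residues of the Hyd class than the wild type.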



2.2 Numerical Methods 

To simulate the thermodynamic behavior of this model, we use simulated
tempering [23,24], in which the temperature is a dynamic variable. Details on our
implementation of this method can be found elsewhere [22]. For a review of simulated
tempering and other generalized-ensemble techniques, see [26].
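To illustrate what it means for the temperature to be a dynamic variable, the sketch below shows a generic temperature update of the textbook simulated-tempering type. It assumes a joint weight exp(-β_k E + g_k) over conformations and a discrete set of inverse temperatures β_k, with tunable constants g_k; the function names, the nearest-neighbor proposal and the choice of g_k are assumptions of this sketch and are not taken from our implementation.

```python
import math
import random


def tempering_acceptance(energy, beta_old, beta_new, g_old, g_new):
    """Metropolis acceptance probability for a temperature update.

    The conformation (and hence its energy) is kept fixed; only the
    temperature index changes. Assumes the weight exp(-beta_k * E + g_k).
    """
    log_ratio = -(beta_new - beta_old) * energy + (g_new - g_old)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def temperature_step(energy, k, betas, g, rng=random):
    """Propose a move to a neighboring temperature index and accept or reject it."""
    k_new = k + rng.choice((-1, 1))
    if not 0 <= k_new < len(betas):
        return k  # proposals outside the temperature ladder are rejected
    p = tempering_acceptance(energy, betas[k], betas[k_new], g[k], g[k_new])
    return k_new if rng.random() < p else k
```

In a full simulation, such temperature updates would alternate with conformational updates at the current temperature, so that a single run samples all temperatures in the chosen set.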

In conformation space we use two different elementary moves: first, the pivot move in
which a single torsion angle is turned; and second, a semi-local method [27] that works
with seven or eight adjacent torsion angles, which are turned in a coordinated manner.
The non-local pivot move is included in our calculations in order to accelerate the
evolution of the system at high temperature, whereas the semi-local method improves
the performance at low temperature.

Our simulations are started from random configurations. All statistical errors quoted
are 1σ errors obtained by analyzing data from eight independent runs.
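One common way to obtain a 1σ error from a small number of independent runs is to treat the per-run averages as independent measurements and quote the standard error of their mean. The sketch below assumes this procedure; the function name is illustrative, and the text does not specify the estimator beyond the use of eight independent runs.

```python
import math
import statistics


def mean_and_error(run_averages):
    """Mean over independent runs and its 1-sigma standard error."""
    n = len(run_averages)
    mean = statistics.mean(run_averages)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    error = statistics.stdev(run_averages) / math.sqrt(n)
    return mean, error
```

With eight runs, `mean_and_error` would be called once per observable and temperature, with a list of the eight run averages as input.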

The temperatures studied range from 0.87 T_m to 1.43 T_m, T_m being the melting
temperature for the wild-type Z domain. The experimental value of this temperature is
T_m = 75°C [4]. Hence, the lowest and highest temperatures studied correspond to
31°C and 225°C, respectively. In the dimensionless energy unit used in our earlier
study [17], T_m is given by kT_m = 0.630 ± 0.001, k being Boltzmann's constant. In
the model we define T_m as the maximum of the specific heat.
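The conversion between reduced temperatures and physical temperatures is a simple rescaling of the absolute temperature. The sketch below spells it out, taking T_m = 75°C and kT_m = 0.630 from the text; the function names are illustrative.

```python
T_M_CELSIUS = 75.0      # experimental melting temperature of the Z domain
KT_M_MODEL = 0.630      # kT_m in the dimensionless energy unit of the model
KELVIN_OFFSET = 273.15


def reduced_to_celsius(t_over_tm, tm_celsius=T_M_CELSIUS):
    """Convert a temperature given in units of T_m to degrees Celsius."""
    tm_kelvin = tm_celsius + KELVIN_OFFSET
    return t_over_tm * tm_kelvin - KELVIN_OFFSET


def reduced_to_model_kt(t_over_tm, kt_m=KT_M_MODEL):
    """Convert a temperature given in units of T_m to kT in model units."""
    return t_over_tm * kt_m


for ratio in (0.87, 1.43):
    print(f"{ratio:.2f} T_m = {reduced_to_celsius(ratio):.0f} °C, "
          f"kT = {reduced_to_model_kt(ratio):.3f}")
```

With the two-digit factors quoted above, the upper limit comes out at 225°C and the lower limit at about 30°C; the small difference from the quoted 31°C is presumably due to rounding of the factor 0.87.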



3 Results and Discussion 



Using the model and methods described in the previous section, we study the
9-54-amino acid fragments of ZSPA-1 and the wild-type Z domain. The latter sequence
differs only by a one-point mutation from the previously studied 10-55-amino acid
fragment of the B domain. Our results for the Z domain are similar to those for the
B domain [17] summarized in the previous section. Figure 1 shows the free energies
F(Δ,E) and F(Δ,E_hb) for the Z domain at T = 0.87 T_m, where Δ denotes RMSD
from the NMR structure (PDB code 2SPZ, model 1). Two major minima can
be seen, with similar hydrogen-bond energies. These minima correspond to the two
possible three-helix-bundle topologies. Both topologies are significantly populated,
but the average total energy is slightly lower for the native topology, and this topology
is the thermodynamically favored one. We also performed an energy minimization
for the native topology, by applying simulated annealing combined with a conjugate
gradient method to a large number of low-temperature conformations. The
minimum-energy structure obtained this way is schematically illustrated in Fig. 2. It has an
RMSD of Δ = 1.7 Å from the NMR structure. The corresponding result for the B
domain was, as mentioned earlier, Δ = 1.8 Å.
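A free-energy surface of this kind can be estimated from simulation data by histogramming. The sketch below assumes the usual definition F = -kT ln P, with P the fraction of sampled conformations falling in a given two-dimensional bin; the function name and the binning are illustrative and are not a description of how Fig. 1 was produced in detail.

```python
import math
from collections import Counter


def free_energy_surface(samples, bin_widths):
    """Return {bin index pair: F/kT} from (delta, energy) samples at one temperature.

    F/kT = -ln P, where P is the fraction of samples in each bin.
    Bins that were never visited are simply absent from the result.
    """
    dx, dy = bin_widths
    counts = Counter((math.floor(x / dx), math.floor(y / dy)) for x, y in samples)
    total = sum(counts.values())
    return {b: -math.log(c / total) for b, c in counts.items()}
```

Two well-populated regions separated in Δ, such as the two bundle topologies, would then show up as two separate minima of F/kT.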

Figure 1: Level diagrams showing the free energies (a) F(Δ, E) and (b) F(Δ, E_hb)
for the 9-54-amino acid fragment of the Z domain at T = 0.87 T_m. E is the total
energy [see equation (1)], E_hb is the hydrogen-bond energy, and Δ denotes RMSD
from the NMR structure. The separation between adjacent contour lines is 1 kT. The
darkest regions correspond to 4 < F/kT < 5 and the white regions to F/kT > 8.

Figure 2: Schematic illustration of the structure obtained by energy minimization
restricted to the thermodynamically favored topology (see text) for the 9-54-amino
acid fragment of the Z domain.

Let us now compare the behavior of the Z domain with that of the engineered ZSPA-1
sequence. By circular dichroism (CD), Wahlberg et al. [4] found ZSPA-1 to be less
helical than the wild-type Z domain, the mean residue ellipticity for ZSPA-1 being 60%
of that for the wild-type sequence. Furthermore, they found that the helix formation
sets in at a lower temperature and is less cooperative for the engineered sequence.
Figure 3a shows the helix content against temperature in our model, for both
sequences.* In agreement with the experimental results, we find that ZSPA-1 has a lower
helix content, and that the helix formation is shifted toward lower temperature for
this sequence. Figure 3b shows the temperature dependence of the radius of gyration.
We find that ZSPA-1 is more compact than the Z domain. A comparison with Fig. 3a
shows that, in our model, chain collapse occurs before helix formation for ZSPA-1.
From Fig. 3 it can also be seen that the melting behavior is less cooperative for
ZSPA-1 than for the Z domain. This conclusion is supported by our data for the
specific heat (not shown); the peak in the specific heat is more pronounced for the
Z domain than for ZSPA-1.

Figure 3: Helix formation and chain collapse for the 46-amino acid fragments of
ZSPA-1 (dashed line) and the Z domain (full line). (a) The number of helical amino
acids, N_h, against temperature. (b) The radius of gyration (calculated over all
backbone atoms), R_g, against temperature. T_m denotes the melting temperature for the
Z domain. The NMR structure for the Z domain has N_h = 29 and R_g = 9.0 Å.

*We define helix content in the following way. Each amino acid, except the two at the ends, is
labeled h if -90° < φ < -30° and -77° < ψ < -17°, and c otherwise. The two amino acids at the
ends are labeled c. An amino acid is said to be helical if both the amino acid itself and its nearest
neighbors are labeled h. The total number of helical amino acids is denoted by N_h. The maximum
value of N_h is N - 4 for a chain with N amino acids.
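The helix-content definition given in the footnote translates directly into a two-step labeling procedure. The following sketch implements it for a single conformation; the function names and the input format (lists of φ and ψ in degrees, one entry per amino acid) are assumptions of this illustration.

```python
def helical_flags(phi, psi):
    """Return one boolean per amino acid: True if it counts as helical.

    phi and psi are sequences of torsion angles in degrees, one pair per
    amino acid.
    """
    n = len(phi)
    # Step 1: label each amino acid h (True) or c (False); the two ends are c.
    h = [0 < i < n - 1 and -90 < phi[i] < -30 and -77 < psi[i] < -17
         for i in range(n)]
    # Step 2: helical means the amino acid and both nearest neighbors are h.
    return [0 < i < n - 1 and h[i - 1] and h[i] and h[i + 1] for i in range(n)]


def helix_content(phi, psi):
    """Number of helical amino acids, N_h, for one conformation."""
    return sum(helical_flags(phi, psi))
```

For a chain of N amino acids with all angles in the helical region, `helix_content` returns N - 4, the maximum value stated in the footnote. Averaging the output of `helical_flags` over sampled conformations gives a per-residue helix probability of the kind denoted h(i).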

That the model predicts ZSPA-1 to be more compact than the Z domain is not
surprising, given that the number of hydrophobic amino acids is larger for ZSPA-1 (14)
than for the Z domain (11). In addition, ZSPA-1 has one more Pro than the wild-type
sequence, which does change the local properties of the chain and could affect the
overall size, too. It should be pointed out that the effect of a Pro on the overall size
may be poorly described by the model because all peptide bonds, including those
preceding a Pro, are held fixed (trans).
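For reference, the compactness measure used here, the radius of gyration, is the root-mean-square distance of the atoms from their centroid. The sketch below assumes that all backbone atoms enter with equal weight, which is an assumption of this illustration; the function name is likewise illustrative.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid.

    coords is a sequence of (x, y, z) positions, here meant to hold the
    backbone atoms; every atom is given the same weight.
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum((c[k] - centroid[k]) ** 2 for c in coords for k in range(3))
    return math.sqrt(sq / n)
```

A smaller value at a given temperature indicates a more compact ensemble, which is how the two sequences are compared.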

The reduced helix content of ZSPA-1 shows that this sequence does not make a
perfect three-helix bundle, but does not tell how its structure differs from a three-helix
bundle. It could be that one of the three helices is missing and that the other two
are still there, but it could also be that the disorder is more uniform along the chain,
so that all three helices are present but partially disordered. The NMR analysis of
ZSPA-1 [3] does not exclude either of these two possibilities.

Figure 4 shows how the helix content varies along the chains in our model. A
comparison with experimental data for the Z domain [5] shows that the first half of helix
II is somewhat distorted in the model. As a result, it is possible that the model
underestimates the structural change produced by the mutation Glu25Pro (see
Table 1), which should have a helix-breaking effect. Our results for helices I and III
of the Z domain are, by contrast, in good agreement with experimental data. These
two helices respond very differently to the mutations leading to ZSPA-1. Our results
suggest that helix III, which itself is free from mutations, remains stable in ZSPA-1,
whereas helix I, which contains seven mutations (see Table 1), turns unstable. Two
possible explanations why the model predicts helix I to become unstable are that
the hydrophobicity pattern of helix I is less helical in ZSPA-1, and that one of the
mutations, Phe13Gly, increases the flexibility of this part of the chain.

Figure 4: Helix content along the chain, h(i), for the 46-amino acid fragments of
ZSPA-1 (dashed line) and the Z domain (full line) at T = 0.87 T_m, where T_m is the
melting temperature for the Z domain. h(i) denotes the probability that amino acid
i is helical (for the definition of helical, see footnote). Thick horizontal lines indicate
helical parts of the NMR structure [5] for the Z domain.

To further investigate how the mutations affect different parts of the chain, we also
perform an RMSD-based analysis. For each conformation, we compute two RMSD
values, Δ_1 and Δ_2, for the first and second halves of the chain, respectively. The
two parts of the chain are separately superimposed on the NMR structure. Figure 5
shows the probability distributions of Δ_1 and Δ_2 both for ZSPA-1 and the Z domain.
In line with the results in Fig. 4, we find that the two Δ_2 distributions are similar,
although the distribution for ZSPA-1 is slightly wider. The two Δ_1 distributions, by
contrast, differ markedly, the mean being significantly higher for ZSPA-1 than for the
wild-type sequence. These results confirm that, in our model, the disorder of ZSPA-1
is not uniformly distributed along the chain; the main difference between ZSPA-1 and
the Z domain lies in the behavior of the first half of the chain.

Figure 5: RMSD distributions for the 46-amino acid fragments of ZSPA-1 (dashed
line) and the Z domain (full line). (a) The distribution of Δ_1 (amino acids 9-31). (b)
The distribution of Δ_2 (amino acids 32-54). Both Δ_1 and Δ_2 are backbone RMSDs.
The temperature is the same as in Fig. 4.



4 Conclusion 



Using a model that combines a relatively detailed chain representation with a simple
interaction potential, we have studied the thermodynamic behaviors of an engineered
sequence and its parent. The model is sequence-based, which makes it possible to
compare the two sequences in a straightforward manner. Despite the simplicity of
the potential, we found that the model is able to capture important effects of the
mutations; the mutated sequence, ZSPA-1, shows a reduced helix content and a
melting behavior that is less cooperative, compared with the wild-type sequence. We also
found that chain collapse occurs before helix formation sets in for ZSPA-1, and that
the main difference between the two sequences lies in the behavior of the first half
of the chain, which is less stable for ZSPA-1. To decide whether or not these two
predictions are correct requires further experimental data.